Abstract

While gravitational waves have not yet been measured directly, data analysis 
from detection experiments commonly includes an upper limit statement. Such 
upper limits may be derived via a frequentist or Bayesian approach; the theoretical
implications are very different, and on the technical side, one notable
difference is that one case requires maximization of the likelihood function
over parameter space, while the other requires integration. Using a simple example
(detection of a sinusoidal signal in white Gaussian noise), we investigate
the differences in performance and interpretation, and the effect of the "trials 
factor", or "look-elsewhere effect". 

1 Introduction 

1.1 Upper limits 

In general, an upper limit is a probabilistic statement bounding one of several unknown parameters
determining the observed data at hand. While it would be hard to derive general properties applicable in
any possible data analysis context, we will for illustration purposes consider a simple case here: a
sinusoidal signal in white Gaussian noise. This example exhibits many similarities with commonly
encountered real-world problems, including the use of Fourier methods, nuisance parameters, trials factors,
and partly analytical, partly numerical analysis, and we believe it is general enough to yield valuable insights.

1.2 The frequentist case 

The frequentist detection approach is based on some detection statistic $d$, which for given data is then
used to derive a significance statement along the lines of "If the data were only noise (null hypothesis $H_0$),
a detection statistic value $> d_0$ would have been observed with probability $p$" ($P(d > d_0 \mid H_0) = p$).
The probability $p$ here is the p-value, and a low p-value corresponds to high significance. In the
case of a non-detection, the statement may then be reversed into an upper limit statement: "Had the signal
amplitude been $\geq A^\star$, a larger detection statistic value ($> d_0$) would have been observed with at least
90% probability" ($P(d > d_0 \mid A \geq A^\star) \geq 90\%$), where $A^\star$ is the 90% confidence upper limit (e.g. [1]).

1.3 The Bayesian case 

In the Bayesian framework, detection and parameter estimation are more separate problems. For detection
purposes one would need to derive the marginal likelihood, or Bayes factor, which (in conjunction with
the prior probabilities for the "signal" and "noise only" hypotheses $H_1$ and $H_0$) allows one to derive the
probability for the presence of a signal. The detection statement would then be "(Given the observed
data $y$,) the probability for the presence of a signal is $p$" ($P(H_1 \mid y) = p$). The upper limit statement on
the other hand is a matter of parameter estimation; given the joint posterior distribution of all unknowns
in the model, one would need to marginalize to get the posterior distribution of the parameter of interest
alone. The upper limit statement would then be "(Given the observed data and the presence of a signal,)
the amplitude is $< A^\star$ with 90% probability" ($P(A < A^\star \mid y, H_1) = 90\%$).



2 The data model 

We assume the data $y$ to be a time series given by a parameterized signal $s$ and additive noise $n$:

$$y(t_i) = s(t_i) + n(t_i), \qquad (1)$$

where $i = 1, \ldots, N$ and $t_i = i\,\Delta t$. The (sinusoidal) signal is given by

$$s(t) = A \sin(2\pi f t + \phi), \qquad (2)$$

where $A \geq 0$ is the amplitude, $0 \leq \phi < 2\pi$ is the phase, and $f \in \{\frac{j_1}{N \Delta t}, \ldots, \frac{j_k}{N \Delta t}\}$ is the frequency,
where $1 \leq j_1, \ldots, j_k \leq \frac{N}{2} - 1$ defines the range of possible (Fourier) frequencies. The number $k$ of
frequency bins may be varied and constitutes the so-called "trials factor" here. The noise $n$ is assumed
to be white and Gaussian with variance $\sigma^2$.
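
As an illustration of the data model in equations (1) and (2), the following minimal sketch draws one realisation of the time series. The function name `simulate_data`, the random seed, and the chosen parameter values are assumptions made for this example only; they are not taken from the original analysis.

```python
import numpy as np

def simulate_data(N, dt, amplitude, j, phase, sigma, rng):
    """Draw y(t_i) = A sin(2 pi f t_i + phi) + n(t_i) for i = 1..N.

    The frequency is the j-th Fourier frequency f = j / (N dt).
    """
    t = dt * np.arange(1, N + 1)               # t_i = i * dt
    f = j / (N * dt)                           # Fourier frequency of bin j
    signal = amplitude * np.sin(2.0 * np.pi * f * t + phase)
    noise = rng.normal(0.0, sigma, size=N)     # white Gaussian noise, variance sigma^2
    return t, signal + noise

# Hypothetical example values (N, dt and sigma mirror the figure captions).
rng = np.random.default_rng(seed=1)
t, y = simulate_data(N=100, dt=1.0, amplitude=0.5, j=10, phase=0.3, sigma=1.0, rng=rng)
```

Restricting `j` to integers between 1 and N/2 - 1 keeps the signal on the Fourier frequencies assumed in the text.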



3 Frequentist approach 

If there were no unknown parameters in the signal model, then, following from the Neyman-Pearson
lemma, the optimal detection statistic would be given by the likelihood ratio of the two hypotheses. When
the hypotheses include unknowns (composite hypotheses), as in our case, this is commonly
treated using the generalized likelihood ratio framework, that is, by considering the ratio of maximized
likelihoods, where maximization is done over the unknown parameters.

In our case, we have a 3-dimensional parameter space (amplitude, phase, and frequency) under the signal
model. The conditional likelihood for a given frequency may be maximized analytically over phase and
amplitude. The profile likelihood (maximized conditional likelihood for given frequency, as a function of
frequency) is ultimately proportional to the time series' periodogram. The generalized likelihood ratio
detection statistic then is given as the periodogram maximized over the frequency range of interest:

$$d^2 := \max_{j \in \{j_1, \ldots, j_k\}} \frac{2}{N \sigma^2} |\tilde{y}_j|^2 \qquad (3)$$

where $\tilde{y}_j$ is the (complex valued) $j$th element of the discretely Fourier transformed time series $y$. The
term maximized over in (3) (the periodogram) is in fact also the matched filter for a sinusoidal
signal [6], and the maximum is commonly referred to as the "loudest event" [2].
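
The detection statistic can be computed with a single FFT. The sketch below assumes the unnormalized DFT convention of `numpy.fft.fft` and the normalization $2/(N\sigma^2)$, under which each noise-only bin follows a $\chi^2_2$ distribution; the function name `loudest_event` and its arguments are illustrative choices rather than part of the original analysis.

```python
import numpy as np

def loudest_event(y, sigma, j_min=1, j_max=None):
    """Return (d2, j_hat): the maximized normalized periodogram and its bin index."""
    N = len(y)
    if j_max is None:
        j_max = N // 2 - 1                     # highest Fourier bin below Nyquist
    y_tilde = np.fft.fft(y)                    # unnormalized discrete Fourier transform
    power = 2.0 * np.abs(y_tilde) ** 2 / (N * sigma ** 2)
    bins = np.arange(j_min, j_max + 1)         # the k = j_max - j_min + 1 trials
    j_hat = bins[np.argmax(power[bins])]
    return power[j_hat], j_hat
```

For a noise-free sinusoid of amplitude $A$ at a Fourier frequency, this returns $d^2 = N A^2 / (2\sigma^2)$, which provides a quick sanity check of the normalization.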

The detection statistic's distribution may be derived analytically under both hypotheses $H_0$ and
$H_1$, as this is a particular case of an extreme value statistic. Under the null hypothesis, $d^2$ is the
maximum of $k$ independently $\chi^2_2$-distributed random variables; the cumulative distribution function (CDF)
of $d^2$ is given by

$$F_{d^2 \mid H_0}(x) = P(d^2 \leq x \mid H_0) = \left(F_{\chi^2_2}(x)\right)^k \qquad (4)$$

where $F_{\chi^2_2}$ is the CDF of a $\chi^2_2$ distribution, and $k$ again is the number of independent frequency bins, or
"trials". This is essentially the "background distribution" of $d^2$. Under the signal hypothesis $H_1$, $d^2$ is the
maximum of $(k-1)$ independently $\chi^2_2$-distributed random variables and one noncentral-$\chi^2_2(\lambda)$-distributed
variable with noncentrality parameter $\lambda = \frac{N}{2\sigma^2} A^2$. The corresponding CDF under $H_1$ then is

$$F_{d^2 \mid H_1, A}(x) = \left(F_{\chi^2_2}(x)\right)^{k-1} \times F_{\chi^2_2(\lambda)}(x) \qquad (5)$$

where $F_{\chi^2_2(\lambda)}$ is the CDF of a noncentral $\chi^2_2$ distribution with noncentrality parameter $\lambda$.
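
Both CDFs are straightforward to evaluate numerically. The following is a minimal sketch using `scipy.stats`; the helper names `noncentrality`, `cdf_h0` and `cdf_h1` are assumptions introduced for this example. The zero-amplitude case is handled explicitly with the central distribution, so that the code does not rely on how the noncentral implementation treats a noncentrality of exactly zero.

```python
from scipy.stats import chi2, ncx2

def noncentrality(amplitude, N, sigma):
    """lambda = N A^2 / (2 sigma^2)."""
    return N * amplitude ** 2 / (2.0 * sigma ** 2)

def cdf_h0(x, k):
    """Equation (4): CDF of the maximum of k independent chi^2_2 variables."""
    return chi2.cdf(x, df=2) ** k

def cdf_h1(x, amplitude, k, N, sigma):
    """Equation (5): (k - 1) noise-only bins times one bin containing the signal."""
    lam = noncentrality(amplitude, N, sigma)
    central = chi2.cdf(x, df=2)
    # For lambda = 0 the noncentral distribution reduces to the central one.
    loud = central if lam == 0 else ncx2.cdf(x, df=2, nc=lam)
    return central ** (k - 1) * loud
```

With `amplitude=0`, `cdf_h1` coincides with `cdf_h0`, as expected from comparing equations (4) and (5).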

For some observed detection statistic value $d_0^2$, the (detection) significance is determined by the
p-value $P(d^2 > d_0^2 \mid H_0) = \int_{d_0^2}^{\infty} p(d^2 \mid H_0)\, \mathrm{d}d^2$. The 90% loudest-event upper limit is given by the
smallest amplitude value $A^\star$ for which $\int_{d_0^2}^{\infty} p(d^2 \mid A^\star, H_1)\, \mathrm{d}d^2 \geq 90\%$, so that
$P(d^2 > d_0^2 \mid A \geq A^\star, H_1) \geq 90\%$.
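
The p-value and the loudest-event upper limit described above can be sketched numerically as follows. This is an illustration under the stated model only: the function names, the bracketing strategy, and the use of `scipy.optimize.brentq` for the root search are choices made for this example, not a description of how such limits are computed in actual searches.

```python
from scipy.optimize import brentq
from scipy.stats import chi2, ncx2

def cdf_h1(x, amplitude, k, N, sigma):
    """CDF of d^2 under H1 (equation (5)); amplitude 0 reproduces equation (4)."""
    lam = N * amplitude ** 2 / (2.0 * sigma ** 2)
    central = chi2.cdf(x, df=2)
    loud = central if lam == 0 else ncx2.cdf(x, df=2, nc=lam)
    return central ** (k - 1) * loud

def p_value(d2_obs, k):
    """P(d^2 > d2_obs | H0), including the trials factor k."""
    return 1.0 - chi2.cdf(d2_obs, df=2) ** k

def frequentist_upper_limit(d2_obs, k, N, sigma, conf=0.9):
    """Smallest amplitude A* with P(d^2 > d2_obs | A*, H1) >= conf."""
    def excess(a):
        return 1.0 - cdf_h1(d2_obs, a, k, N, sigma) - conf
    if excess(0.0) >= 0.0:
        return 0.0                 # the noise-only case already exceeds d2_obs often enough
    a_hi = sigma
    while excess(a_hi) < 0.0:      # expand the bracket until the target is reached
        a_hi *= 2.0
    return brentq(excess, 0.0, a_hi)
```

The early return makes explicit that the limit is exactly zero whenever the observed value lies at or below the lower 10% quantile of the background distribution.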

Fig. 1: The integrals to be computed for a frequentist and a Bayesian 90% upper limit are very different. The
Bayesian integral is computed along the vertical amplitude axis, conditioning on the observed detection statistic
value $d^2 = d_0^2$. The frequentist integral goes along the horizontal axis of possible realisations of $d^2$ for any given
amplitude.

4 Bayesian approach

We assume uniform prior distributions on phase, frequency, and amplitude. Given the (3-dimensional)
likelihood function, one can then derive joint and marginal posterior distributions $P(A, \phi, f \mid y)$ and
$P(A \mid y)$. However, Monte Carlo simulations show that, in this particular model, the amplitude's
marginal posterior distribution is virtually unaffected by whether one considers the complete data $y$ or
only the "loudest event" $d^2$. The essential information about the signal amplitude is contained in that
loudest event, and the marginal amplitude posterior is dominated by the conditional distribution of the
loudest frequency bin. We find that the main difference between the two kinds of limits in this model is
not due to maximization vs. integration of the posterior; in the following we will therefore consider only
the simpler, directly comparable, and more illustrative case of a Bayesian loudest event limit based on
$P(A \mid d^2)$ instead of $P(A \mid y)$.

Our relevant observable now is the "loudest event" $d^2$. The likelihood function $P(d^2 \mid A)$ was
defined through equation (5) in the previous section. The 90% upper limit on the amplitude is given by the
amplitude $A^\star$ for which $\int_0^{A^\star} p(A \mid d^2, H_1)\, \mathrm{d}A = 90\%$, so that $P(A < A^\star \mid d^2, H_1) = 90\%$.
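
A simple way to evaluate this Bayesian loudest-event limit is numerical integration on an amplitude grid. The sketch below obtains the likelihood $p(d^2 \mid A)$ by differentiating the CDF in equation (5) and uses the uniform amplitude prior, so the posterior is proportional to the likelihood. The grid size, the cutoff `a_max`, and all function names are assumptions made for illustration, and $k \geq 2$ is assumed.

```python
import numpy as np
from scipy.stats import chi2, ncx2

def likelihood(d2, amplitudes, k, N, sigma):
    """p(d^2 | A): derivative of equation (5) with respect to x, at x = d2 (k >= 2)."""
    lam = N * np.asarray(amplitudes, dtype=float) ** 2 / (2.0 * sigma ** 2)
    F, f = chi2.cdf(d2, df=2), chi2.pdf(d2, df=2)
    safe = np.where(lam > 0, lam, 1.0)             # placeholder avoids nc = 0
    G = np.where(lam > 0, ncx2.cdf(d2, df=2, nc=safe), F)
    g = np.where(lam > 0, ncx2.pdf(d2, df=2, nc=safe), f)
    # Either one of the (k - 1) noise bins is the loudest, or the signal bin is.
    return (k - 1) * F ** (k - 2) * f * G + F ** (k - 1) * g

def bayesian_upper_limit(d2_obs, k, N, sigma, conf=0.9, a_max=None, n_grid=5001):
    """Quantile of the amplitude posterior under a uniform amplitude prior."""
    if a_max is None:
        # Assumed cutoff: the likelihood is negligible once sqrt(lambda) exceeds sqrt(d2) + 8.
        a_max = (np.sqrt(d2_obs) + 8.0) * sigma * np.sqrt(2.0 / N)
    a = np.linspace(0.0, a_max, n_grid)
    post = likelihood(d2_obs, a, k, N, sigma)      # unnormalized posterior density
    steps = 0.5 * (post[1:] + post[:-1]) * np.diff(a)
    cum = np.concatenate(([0.0], np.cumsum(steps)))  # trapezoidal cumulative integral
    cum /= cum[-1]
    return np.interp(conf, cum, a)
```

Because the likelihood is strictly positive for every amplitude, the resulting limit is always greater than zero, in contrast to the frequentist limit.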

5 Comparison 

The likelihood function here is a function of two parameters: the observable $d^2$ and the amplitude parameter
$A$. Since the amplitude prior is assumed uniform, the posterior distribution is simply proportional to
the likelihood, which allows for a nice comparison of both approaches. Fig. 1 illustrates the integrations
performed for both the frequentist and the Bayesian upper limits for some particular realisation $d^2 = d_0^2$.

Since the data $y$ are reduced to a single observable $d^2$, there also is a one-to-one mapping from $d^2$
to the upper limit $A^\star$. Fig. 2 shows both resulting upper limits as a function of the "loudest event" $d^2$.
An important feature to note is that the frequentist limit will be zero for certain values of $d^2$. The point
at (and below) which this happens is the lower 10% quantile of the "background" distribution of $d^2$
under $H_0$: at this point the probability of observing a larger value is (by definition) 90% for
zero-amplitude signals already, which makes zero the 90% upper limit. Note that this implies that if $H_0$
in fact is true, 10% of all 90% upper limits will be zero. Note also that this is consistent with the intended
90% coverage of frequentist confidence bounds: if the upper limit is supposed to fall above and below
the true amplitude value with 90% and 10% probabilities respectively, then 10% of the upper limits must
be zero under $H_0$.

Fig. 2: The mapping from observable $d^2$ to the upper limit on amplitude. The bottom panel shows the
"background" distribution of $d^2$ under $H_0$. (Example values here: $N = 100$, $\Delta t = 1$, $\sigma^2 = 1$, $k = 49$.)
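
The statement that 10% of all frequentist 90% limits are zero under $H_0$ can be checked with a small simulation. The sketch below generates noise-only time series, computes $d^2$ via the FFT (assuming the normalization $2/(N\sigma^2)$ and bins $j = 1, \ldots, k$), and counts how often $d^2$ falls at or below the lower 10% quantile of the background distribution. The number of simulations and the seed are arbitrary choices for this example.

```python
import numpy as np
from scipy.stats import chi2

def zero_limit_fraction(n_sim, N, k, sigma, conf=0.9, seed=0):
    """Fraction of noise-only experiments in which the frequentist limit is zero."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, sigma, size=(n_sim, N))
    power = 2.0 * np.abs(np.fft.fft(noise, axis=1)) ** 2 / (N * sigma ** 2)
    d2 = power[:, 1:k + 1].max(axis=1)             # loudest event over bins j = 1..k
    # Lower (1 - conf) quantile of the background distribution, from equation (4).
    threshold = chi2.ppf((1.0 - conf) ** (1.0 / k), df=2)
    return np.mean(d2 <= threshold)

frac = zero_limit_fraction(n_sim=4000, N=100, k=49, sigma=1.0)
```

Up to Monte Carlo error, `frac` should be close to 0.1 for any admissible trials factor $k \leq N/2 - 1$.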

Having the distribution of the detection statistic (equations (4), (5)) and the mapping from $d^2$ to
upper limit (Fig. 2) allows us to derive the distribution of upper limits for given parameters. Figure 3
illustrates the behaviour of the resulting upper limits for different values of amplitude $A$ and trials
factor $k$. The left panel shows that for large amplitudes the two limits behave roughly the same, as one could
already see from Fig. 2, while for low amplitudes the posterior upper limit will level off and will not rule
out amplitude values below a certain noise level. The frequentist limit's distribution on the other hand
reaches all the way down to zero, and in particular the 90% limit's 10% quantile follows a straight line
of slope 1 and intercept 0: the frequentist 90% limit is (by construction) essentially a statistic that has
its 10% quantile at the true amplitude value.

The right panel of Fig. 3 shows the differing behaviour of both limits as a function of the trials
factor k when the true amplitude is zero. The frequentist limit's 10% quantile remains at zero (the true 
value), while the posterior limit is bounded away from zero but otherwise tends to yield tighter constraints 
on the amplitude, especially for large k. 

6 Conclusions 

The most obvious technical difference between frequentist and Bayesian upper limits is in maximization
vs. integration over parameter space. This, however, is not (at least in the example discussed here)
the primary origin of discrepancies between the two. When founding both limits on maximization (i.e.,
the "loudest event"), the behaviour of the Bayesian limit is affected very little; so the crucial information
about the signal amplitude is in fact contained in the loudest event. Both kinds of upper limits behave
very similarly for "loud" signals, i.e., a large signal-to-noise ratio (SNR), but their differences become
apparent in the interesting case of (near-) zero amplitude signals. While the Bayesian upper limit expresses
what amplitude values may be ruled out with 90% certainty based on the data (and model assumptions),
the frequentist upper confidence limit is defined solely through its "coverage" property. The frequentist
90% limit needs to end up above and below the true amplitude value with 90% and 10% probability
respectively, which simply means that the frequentist limit may be any random variable that has its 10%
quantile at the true amplitude. This in particular implies that for a true amplitude of $A = 0$ the limit has
a 10% chance of being zero as well, and it makes the frequentist limit very hard to interpret,
not only when it happens to turn out as zero. When considering the effect of the trials factor (or
look-elsewhere effect) in the low-SNR regime where both limits behave differently, the posterior-based
limit will usually yield tighter constraints, especially for large trials factors, but it will never be zero.

Fig. 3: The distribution of upper limits as a function of amplitude (left panel) and trials factor (for zero amplitude;
right panel). Note that the frequentist 90% limit is essentially a statistic that is designed to have its 10% quantile
at the true amplitude value.

The Bayesian upper limit based on the amplitude's posterior distribution will of course change 
with changing prior assumptions. For simplicity, we assumed an (improper) uniform amplitude prior 
here, but this should actually be a conservative choice in some sense, for a realistic prior in the continuous 
gravitational-wave context would in general be much more concentrated towards low amplitude values 
(something like the — also improper — prior with density p{A) oc ^). 

Another question is how exactly one would do the actual computations for a Bayesian upper limit
in practice. The frequentist upper limits are usually not computed via direct analytical or numerical
integration of the likelihood; instead the integral (see Fig. 1) is determined in a nonparametric fashion via
Monte Carlo integration and bootstrapping of the data. While the frequentist limit requires finding the
amplitude $A^\star$ at which the integral ($P(d^2 > d_0^2 \mid A = A^\star)$) yields the desired confidence level, an
analogous procedure to derive the Bayesian upper limit would probably require Monte Carlo sampling of
$P(d^2 \mid A)$ across the range of all amplitudes $A$ in order to then do the integral in the orthogonal direction.

Further complications arise especially for the frequentist limit when the signal model gets more
complex. The general procedure required for the Bayesian upper limit is rather obvious: determine the
marginal posterior distribution of amplitude $P(A \mid y)$, then determine the 90% quantile. The frequentist
procedure on the other hand may run into major problems. For example, if there are multiple parameters
affecting the signal's SNR, a "loudest event" might be hard to define, or to translate into a constraint
on the amplitude. As there may not be a simple one-to-one connection between SNR and amplitude
parameter as in the present case, the "loudest event" may not be the only relevant figure to constrain the
signal amplitude. The consideration of nuisance parameters is generally tricky in a frequentist framework
and may effectively suggest the use of a Bayesian procedure instead [8]. Computation also becomes more
complicated if the frequency parameter is not restricted to ("independent") Fourier frequencies. Note that
the reasoning behind the generalized likelihood ratio approach (see Sec. 3) leading to the "loudest event"
concept was very much an ad-hoc construction in the first place.

Fig. 4: Illustration of the determination of a 90% detection sensitivity threshold. Such a statement would be
independent of the observed data, and it requires the specification of an additional parameter: the corresponding
false alarm rate defining the threshold of what is considered a "detection". (Here: $N = 100$, $\Delta t = 1$, $\sigma^2 = 1$,
$k = 49$.)

Another notable related concept is that of a power constrained upper limit. In search experiments, 
these may be based on the sensitivity of the search procedure. In case the search yielded no detection, 
one can state the signal amplitude that would have been detected with 90% probability; this number may 
then also be used as a lower bound on the frequentist limit ("don't rule out what you wouldn't be able 
to detect"). However, this kind of statement requires the specification of another, additional parameter: 
the corresponding false alarm rate defining the threshold of what is considered a "detection", and as 
such is inseparably connected to the detection procedure (see also Fig. 4). In particle physics a different
approach is commonly taken; here the sensitivity is usually specified as the expected upper limit for
many repetitions of the experiment in the absence of a signal. This figure would correspond to the solid
lines at zero amplitude in Fig. 3. An important point to note is that both these sensitivity statements do
not depend on the observed data. 
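
The first kind of sensitivity statement can be sketched as follows: fix a false alarm probability, derive the detection threshold from the background distribution in equation (4), and find the amplitude that exceeds this threshold with 90% probability according to equation (5). The false alarm probability of 1% used in the usage line is a hypothetical value chosen for illustration and is not taken from Fig. 4; the function names are likewise assumptions.

```python
from scipy.optimize import brentq
from scipy.stats import chi2, ncx2

def detection_threshold(false_alarm_prob, k):
    """Value of d^2 exceeded with probability false_alarm_prob under H0."""
    return chi2.ppf((1.0 - false_alarm_prob) ** (1.0 / k), df=2)

def detection_probability(amplitude, threshold, k, N, sigma):
    """P(d^2 > threshold | A, H1) from equation (5)."""
    lam = N * amplitude ** 2 / (2.0 * sigma ** 2)
    central = chi2.cdf(threshold, df=2)
    loud = central if lam == 0 else ncx2.cdf(threshold, df=2, nc=lam)
    return 1.0 - central ** (k - 1) * loud

def sensitivity_amplitude(false_alarm_prob, k, N, sigma, power=0.9):
    """Amplitude detected with probability `power` (assumes false_alarm_prob < power)."""
    thr = detection_threshold(false_alarm_prob, k)
    def gap(a):
        return detection_probability(a, thr, k, N, sigma) - power
    a_hi = sigma
    while gap(a_hi) < 0.0:          # expand the bracket until the target power is reached
        a_hi *= 2.0
    return brentq(gap, 0.0, a_hi)

# Hypothetical false alarm probability of 1%; other values follow the figure captions.
a90 = sensitivity_amplitude(false_alarm_prob=0.01, k=49, N=100, sigma=1.0)
```

Note that no observed data enter this computation; only the model parameters and the chosen false alarm probability do.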